library(tidyverse)
library(knitr)
library(broom)
library(stringr)
library(modelr)
library(forcats)
library(ggmap)
library(plotly)
options(digits = 3)
set.seed(1234)
theme_set(theme_minimal())Geospatial visualizations are some of the oldest data visualization methods in human existence. Data maps were first popularized in the seventeenth century and have grown in complexity and detail since then. Consider Google Maps, the sheer volume of data depicted, and the analytical pathways available to its users.
Geometric visualizations are used to depict spatial features, and with the incorporation of data reveal additional attributes and information. The main features of a map are defined by its scale (the proportion between distances and sizes on the map), its projection (how the three-dimensional Earth is represented on a two-dimensional surface), and its symbols (how data is depicted and visualized on the map).1
Drawing maps in R is a layer-like process. Typically you start by drawing the boundaries of the geographic regions you wish to visualize, then you add additional layers defined by other variables:
A geographic information system (GIS) is software that is “designed to capture, store, manipulate, analyze, manage, and present spatial or geographic data”.2 There are many specialized software packages for spatial data analysis, many of which are commercial or proprietary software. For serious spatial data analysis tasks, you probably want to learn how to use these products. However for casual usage, R has a number of tools for drawing maps, most importantly ggplot2.
The maps package includes the map() function for drawing maps based on bundled geodatabases using the graphics package. maps includes basic boundary files for:
We can extract the boundary files using map_data(), which converts the data object to a data frame:
library(maps)
# world
map_data("world") %>%
as_tibble()## # A tibble: 99,338 x 6
## long lat group order region subregion
## * <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -69.9 12.5 1. 1 Aruba <NA>
## 2 -69.9 12.4 1. 2 Aruba <NA>
## 3 -69.9 12.4 1. 3 Aruba <NA>
## 4 -70.0 12.5 1. 4 Aruba <NA>
## 5 -70.1 12.5 1. 5 Aruba <NA>
## 6 -70.1 12.6 1. 6 Aruba <NA>
## 7 -70.0 12.6 1. 7 Aruba <NA>
## 8 -70.0 12.6 1. 8 Aruba <NA>
## 9 -69.9 12.5 1. 9 Aruba <NA>
## 10 -69.9 12.5 1. 10 Aruba <NA>
## # ... with 99,328 more rows
# usa
map_data("usa") %>%
as_tibble()## # A tibble: 7,243 x 6
## long lat group order region subregion
## * <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -101. 29.7 1. 1 main <NA>
## 2 -101. 29.7 1. 2 main <NA>
## 3 -101. 29.7 1. 3 main <NA>
## 4 -101. 29.6 1. 4 main <NA>
## 5 -101. 29.6 1. 5 main <NA>
## 6 -101. 29.6 1. 6 main <NA>
## 7 -101. 29.6 1. 7 main <NA>
## 8 -101. 29.6 1. 8 main <NA>
## 9 -101. 29.6 1. 9 main <NA>
## 10 -101. 29.6 1. 10 main <NA>
## # ... with 7,233 more rows
# states
map_data("state") %>%
as_tibble()## # A tibble: 15,537 x 6
## long lat group order region subregion
## * <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -87.5 30.4 1. 1 alabama <NA>
## 2 -87.5 30.4 1. 2 alabama <NA>
## 3 -87.5 30.4 1. 3 alabama <NA>
## 4 -87.5 30.3 1. 4 alabama <NA>
## 5 -87.6 30.3 1. 5 alabama <NA>
## 6 -87.6 30.3 1. 6 alabama <NA>
## 7 -87.6 30.3 1. 7 alabama <NA>
## 8 -87.6 30.3 1. 8 alabama <NA>
## 9 -87.7 30.3 1. 9 alabama <NA>
## 10 -87.8 30.3 1. 10 alabama <NA>
## # ... with 15,527 more rows
map_data() converts the geodatabases into data frames where each row is a single point on the map. The resulting data frame contains the following variables:
long - longitude. Things to the west of the prime meridian are negativelat - latitudeorder - this identifies the order ggplot() should follow to “connect the dots” and draw the bordersregion and subregion identify what region or subregion a set of points surroundsgroup - this is perhaps the most important variable in the data frame. ggplot() uses the group aesthetic to determine whether adjacent points should be connected by a line. If they are in the same group, then the points are connected. If not, then they are not connected. This is the same basic principle for standard geom_line() plots:
library(gapminder)
# no group aesthetic
ggplot(gapminder, aes(year, lifeExp)) +
geom_line()# with grouping by country
ggplot(gapminder, aes(year, lifeExp, group = country)) +
geom_line()Note that group is not the same thing as region or subregion. If a region contains landmasses that are discontiguous, there should be multiple groups to properly draw the region:
map("state", "michigan")maps contains a very limited number of geodatabases. If you want to import a different country’s borders or some other geographic information, you will likely find the data in a shapefile. This is a special file format that encodes points, lines, and polygons in geographic space. Files appear with a .shp extension, sometimes with accompanying files ending in .dbf and .prj.
.shp stores the geographic coordinates of the geographic features (e.g. country, state, county).dbf stores data associated with the geographic features (e.g. unemployment rate, crime rates, percentage of votes cast for Donald Trump).prj stores information about the projection of the coordinates in the shapefile (we’ll handle this shortly)There are a crap ton of packages for R that allow you to interact with shapefiles and spatial data. Let’s start with a shapefile for state boundaries in the United States.3 We’ll use readOGR() from the rgdal package to read in the data file:
library(rgdal)
usa <- readOGR("data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp")## OGR data source with driver: ESRI Shapefile
## Source: "/Users/soltoffbc/Projects/Data Visualization/dataviz/notes/data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp", layer: "cb_2013_us_state_20m"
## with 52 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
str(usa, max.level = 2)## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 52 obs. of 9 variables:
## ..@ polygons :List of 52
## ..@ plotOrder : int [1:52] 32 48 29 3 18 46 33 22 34 51 ...
## ..@ bbox : num [1:2, 1:2] -179.1 17.9 179.8 71.4
## .. ..- attr(*, "dimnames")=List of 2
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
This is decidedly not a tidy data frame. Once you import the shapefile, it’s best to convert it to a data frame for ggplot(). We can do this using fortify():
usa %>%
fortify() %>%
head()## long lat order hole piece id group
## 1 -88.5 31.9 1 FALSE 1 0 0.1
## 2 -88.5 31.9 2 FALSE 1 0 0.1
## 3 -88.5 31.9 3 FALSE 1 0 0.1
## 4 -88.5 31.9 4 FALSE 1 0 0.1
## 5 -88.5 32.0 5 FALSE 1 0 0.1
## 6 -88.5 32.0 6 FALSE 1 0 0.1
Under this approach, the id variable is just a number assigned to each region (in this case, each state/territory). However the shapefile contains linked data with attributes for each region. We can access this using the @data accessor:
usa@data %>%
as_tibble## # A tibble: 52 x 9
## STATEFP STATENS AFFGEOID GEOID STUSPS NAME LSAD ALAND AWATER
## * <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 01 01779775 0400000US01 01 AL Alabama 00 131172… 459492…
## 2 05 00068085 0400000US05 05 AR Arkans… 00 134772… 295881…
## 3 06 01779778 0400000US06 06 CA Califo… 00 403482… 204843…
## 4 09 01779780 0400000US09 09 CT Connec… 00 125419… 181540…
## 5 12 00294478 0400000US12 12 FL Florida 00 138897… 314136…
## 6 13 01705317 0400000US13 13 GA Georgia 00 148962… 494780…
## 7 16 01779783 0400000US16 16 ID Idaho 00 214045… 239773…
## 8 17 01779784 0400000US17 17 IL Illino… 00 143793… 620168…
## 9 18 00448508 0400000US18 18 IN Indiana 00 927895… 153667…
## 10 20 00481813 0400000US20 20 KS Kansas 00 211752… 134671…
## # ... with 42 more rows
We can keep these variables in the new data frame through parameters to fortify(region = ""):
# state name
usa %>%
fortify(region = "NAME") %>%
head## long lat order hole piece id group
## 1 -88.5 31.9 1 FALSE 1 Alabama Alabama.1
## 2 -88.5 31.9 2 FALSE 1 Alabama Alabama.1
## 3 -88.5 31.9 3 FALSE 1 Alabama Alabama.1
## 4 -88.5 31.9 4 FALSE 1 Alabama Alabama.1
## 5 -88.5 32.0 5 FALSE 1 Alabama Alabama.1
## 6 -88.5 32.0 6 FALSE 1 Alabama Alabama.1
# FIPS code
usa %>%
fortify(region = "STATEFP") %>%
head## long lat order hole piece id group
## 1 -88.5 31.9 1 FALSE 1 01 01.1
## 2 -88.5 31.9 2 FALSE 1 01 01.1
## 3 -88.5 31.9 3 FALSE 1 01 01.1
## 4 -88.5 31.9 4 FALSE 1 01 01.1
## 5 -88.5 32.0 5 FALSE 1 01 01.1
## 6 -88.5 32.0 6 FALSE 1 01 01.1
# keep it all
(usa2 <- usa %>%
fortify(region = "NAME") %>%
as_tibble %>%
left_join(usa@data, by = c("id" = "NAME")))## # A tibble: 46,174 x 15
## long lat order hole piece id group STATEFP STATENS AFFGEOID
## <dbl> <dbl> <int> <lgl> <fct> <chr> <fct> <fct> <fct> <fct>
## 1 -88.5 31.9 1 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 2 -88.5 31.9 2 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 3 -88.5 31.9 3 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 4 -88.5 31.9 4 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 5 -88.5 32.0 5 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 6 -88.5 32.0 6 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 7 -88.5 32.1 7 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 8 -88.4 32.2 8 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 9 -88.4 32.2 9 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## 10 -88.4 32.2 10 FALSE 1 Alaba… Alabam… 01 01779775 0400000U…
## # ... with 46,164 more rows, and 5 more variables: GEOID <fct>,
## # STUSPS <fct>, LSAD <fct>, ALAND <fct>, AWATER <fct>
Simple features or simple feature access refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. It also describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.
The standard is widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. A subset of simple features forms the GeoJSON standard.
R has well-supported classes for storing spatial data (sp) and interfacing to the above mentioned environments (rgdal, rgeos), but has so far lacked a complete implementation of simple features, making conversions at times convoluted, inefficient or incomplete. The package sf tries to fill this gap, and aims at succeeding sp in the long term.
All geometries are composed of points. Points are coordinates in a 2-, 3- or 4-dimensional space. All points in a geometry have the same dimensionality. In addition to X and Y coordinates, there are two optional additional dimensions:
The four possible cases then are:
The following seven simple feature types are the most common, and are for instance the only ones used for GeoJSON:
| type | description |
|---|---|
POINT |
zero-dimensional geometry containing a single point |
LINESTRING |
sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry |
POLYGON |
geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring |
MULTIPOINT |
set of points; a MULTIPOINT is simple if no two Points in the MULTIPOINT are equal |
MULTILINESTRING |
set of linestrings |
MULTIPOLYGON |
set of polygons |
GEOMETRYCOLLECTION |
set of geometries of any type except GEOMETRYCOLLECTION |
Coordinates can only be placed on the Earth’s surface when their coordinate reference system (CRS) is known; this may be an spheroid CRS such as WGS84, a projected, two-dimensional (Cartesian) CRS such as a UTM zone or Web Mercator, or a CRS in three-dimensions, or including time. Similarly, M-coordinates need an attribute reference system, e.g. a measurement unit.
sf stores simple features as basic R data structures (lists, matrix, vectors, etc.)sf, the table (data.frame) with feature attributes and feature geometries, which containssfc, the list-column with the geometries for each feature (record), which is composed ofsfg, the feature geometry of an individual simple featurest_read(0 imports a shapefile and converts it to a simple feature data frame):
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))## Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.3 ymin: 33.9 xmax: -75.5 ymax: 36.6
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
The short report printed gives the file name, the driver (ESRI Shapefile), mentions that there are 100 features (records, represented as rows) and 14 fields (attributes, represented as columns). This object is of class
class(nc)## [1] "sf" "data.frame"
meaning it extends (and “is” a) data.frame, but with a single list-column with geometries, which is held in the column with name
attr(nc, "sf_column")## [1] "geometry"
If we print the first three features, we see their attribute values and an abridged version of the geometry
print(nc[9:15], n = 3)## Simple feature collection with 100 features and 6 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.3 ymin: 33.9 xmax: -75.5 ymax: 36.6
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
## First 3 features:
## BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 geometry
## 1 1091 1 10 1364 0 19 MULTIPOLYGON (((-81.5 36.2,...
## 2 487 0 10 542 3 12 MULTIPOLYGON (((-81.2 36.4,...
## 3 3188 5 208 3616 6 260 MULTIPOLYGON (((-80.5 36.2,...
In the output we see:
data.frame row, consisting of attributes and geometrygeometry column is a simple feature list-column (an object of class sfc, which is a column in the data.frame)geometry is a single simple feature geometry (an object of class sfg)mapsThe maps package includes the map() function for drawing maps based on bundled geodatabases using the graphics package:
# map of the world
map()# usa boundaries
map("usa")map("state")# county map of illinois
map("county", "illinois")These are fine, but we’d rather use them with our friendly ggplot2 library. We can do this by converting the geodatabases into data frames for plotting with ggplot2 using the map_data() function we saw before.
ggplot2map_data()Let’s draw a map of the United States. First we need to store the USA map boundaries in a data frame:
usa <- map_data("usa") %>%
as_tibble
usa## # A tibble: 7,243 x 6
## long lat group order region subregion
## * <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -101. 29.7 1. 1 main <NA>
## 2 -101. 29.7 1. 2 main <NA>
## 3 -101. 29.7 1. 3 main <NA>
## 4 -101. 29.6 1. 4 main <NA>
## 5 -101. 29.6 1. 5 main <NA>
## 6 -101. 29.6 1. 6 main <NA>
## 7 -101. 29.6 1. 7 main <NA>
## 8 -101. 29.6 1. 8 main <NA>
## 9 -101. 29.6 1. 9 main <NA>
## 10 -101. 29.6 1. 10 main <NA>
## # ... with 7,233 more rows
geom_polygon() to draw lines between points and close them up (connect the last point with the first point)x = long and y = latgroup aesthetic to the group columnggplot() +
geom_polygon(data = usa, aes(x = long, y = lat, group = group))Ta-da! A few things we want to immediately start thinking about. First, because latitude and longitude have absolute relations to one another, we need to fix the aspect ratio so that we don’t accidentially compress the graph in one dimension. Fixing the aspect ratio also means that even if you change the outer dimensions of the graph (i.e. adjust the window size), then the aspect ratio of the graph itself remains unchanged. We can do this using the coord_fixed() function:
ggplot() +
geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
coord_fixed()Now it looks a little squished. You can play around with the aspect ratio to find a better projection:4
ggplot() +
geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
coord_fixed(1.3)Since this is a ggplot() object, we can change the fill and color aesthetics for the map:
ggplot() +
geom_polygon(data = usa, aes(x = long, y = lat, group = group),
fill = NA, color = "red") +
coord_fixed(1.3)gg1 <- ggplot() +
geom_polygon(data = usa, aes(x = long, y = lat, group = group),
fill = "violet", color = "blue") +
coord_fixed(1.3)
gg1What happens if we plot the map without the group aesthetic?
ggplot() +
geom_polygon(data = usa, aes(x = long, y = lat),
fill = "violet", color = "blue") +
coord_fixed(1.3)Oops. The map doesn’t connect the dots in the correct order.
maps also comes with a state boundaries geodatabase:
states <- map_data("state") %>%
as_tibble()
states## # A tibble: 15,537 x 6
## long lat group order region subregion
## * <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -87.5 30.4 1. 1 alabama <NA>
## 2 -87.5 30.4 1. 2 alabama <NA>
## 3 -87.5 30.4 1. 3 alabama <NA>
## 4 -87.5 30.3 1. 4 alabama <NA>
## 5 -87.6 30.3 1. 5 alabama <NA>
## 6 -87.6 30.3 1. 6 alabama <NA>
## 7 -87.6 30.3 1. 7 alabama <NA>
## 8 -87.6 30.3 1. 8 alabama <NA>
## 9 -87.7 30.3 1. 9 alabama <NA>
## 10 -87.8 30.3 1. 10 alabama <NA>
## # ... with 15,527 more rows
By default, each state is filled with the same color:
ggplot(data = states) +
geom_polygon(aes(x = long, y = lat, group = group), color = "white") +
coord_fixed(1.3)We can adjust this by using the fill aesthetic. Here, let’s map region to fill:
ggplot(data = states) +
geom_polygon(aes(x = long, y = lat, fill = region, group = group), color = "white") +
coord_fixed(1.3) +
# turn off color legend
theme(legend.position = "none")Here, each state is assigned a different color at random. You can start to imagine how we might build a choropleth by mapping a different variable to fill, but we’ll return to that in a little bit.
We can use filter() to subset the states data frame and draw a map with only a subset of the states. For example, if we want to only graph states in the Midwest:
midwest <- subset(states, region %in% c("illinois", "indiana", "iowa",
"kansas", "michigan", "minnesota",
"missouri", "nebraska", "north dakota",
"ohio", "south dakota", "wisconsin"))
ggplot(data = midwest) +
geom_polygon(aes(x = long, y = lat, group = group),
fill = "palegreen", color = "black") +
coord_fixed(1.3)First let’s get the Illinois boundaries:
il_df <- filter(states, region == "illinois")Now let’s get the accompanying counties:
counties <- map_data("county") %>%
as_tibble
il_county <- filter(counties, region == "illinois")
il_county## # A tibble: 1,696 x 6
## long lat group order region subregion
## <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -91.5 40.2 561. 21883 illinois adams
## 2 -90.9 40.2 561. 21884 illinois adams
## 3 -90.9 40.1 561. 21885 illinois adams
## 4 -90.9 39.8 561. 21886 illinois adams
## 5 -90.9 39.8 561. 21887 illinois adams
## 6 -91.4 39.8 561. 21888 illinois adams
## 7 -91.4 39.8 561. 21889 illinois adams
## 8 -91.4 39.9 561. 21890 illinois adams
## 9 -91.5 39.9 561. 21891 illinois adams
## 10 -91.4 39.9 561. 21892 illinois adams
## # ... with 1,686 more rows
Plot the state first. This time, lets’ remove all the axes gridlines and background junk using theme_void():
il_base <- ggplot(data = il_df, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")
il_base +
theme_void()Now let’s plot the county boundaries in white:
il_base +
theme_void() +
geom_polygon(data = il_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) # get the state border back on topIf you were observant, you noticed map_data("states") only includes the 48 contiguous states in the United States. This is because Alaska and Hawaii exist far off from the rest of the states. What happens if you try to draw a map with them in it?
Yup, that doesn’t look right. Most maps of the United States place Alaska and Hawaii as insets to the south of California. Until recently, in R this was an extremely tedious task that required manually changing the latitude and longitude coordinates for these states to place them in the correct location. Fortunately a new package is available that has already done the work for you. fiftystater includes the fifty_states data frame which contains adjusted coordinates for Alaska and Hawaii to plot them with the mainland:
library(fiftystater)
data("fifty_states")
fifty_states %>%
as_tibble## # A tibble: 13,694 x 7
## long lat order hole piece id group
## <dbl> <dbl> <int> <lgl> <fct> <chr> <fct>
## 1 -85.1 32.0 1 FALSE 1 alabama Alabama.1
## 2 -85.1 31.9 2 FALSE 1 alabama Alabama.1
## 3 -85.1 31.9 3 FALSE 1 alabama Alabama.1
## 4 -85.1 31.8 4 FALSE 1 alabama Alabama.1
## 5 -85.1 31.8 5 FALSE 1 alabama Alabama.1
## 6 -85.1 31.7 6 FALSE 1 alabama Alabama.1
## 7 -85.1 31.7 7 FALSE 1 alabama Alabama.1
## 8 -85.1 31.7 8 FALSE 1 alabama Alabama.1
## 9 -85.1 31.6 9 FALSE 1 alabama Alabama.1
## 10 -85.0 31.6 10 FALSE 1 alabama Alabama.1
## # ... with 13,684 more rows
ggplot(data = fifty_states, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")Recall how to import a shapefile and convert it to a data frame. Once we have it in that form:
glimpse(usa2)## Observations: 46,174
## Variables: 15
## $ long <dbl> -88.5, -88.5, -88.5, -88.5, -88.5, -88.5, -88.5, -88....
## $ lat <dbl> 31.9, 31.9, 31.9, 31.9, 32.0, 32.0, 32.1, 32.2, 32.2,...
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ hole <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS...
## $ piece <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ id <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"...
## $ group <fct> Alabama.1, Alabama.1, Alabama.1, Alabama.1, Alabama.1...
## $ STATEFP <fct> 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 0...
## $ STATENS <fct> 01779775, 01779775, 01779775, 01779775, 01779775, 017...
## $ AFFGEOID <fct> 0400000US01, 0400000US01, 0400000US01, 0400000US01, 0...
## $ GEOID <fct> 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 0...
## $ STUSPS <fct> AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, A...
## $ LSAD <fct> 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 0...
## $ ALAND <fct> 131172434095, 131172434095, 131172434095, 13117243409...
## $ AWATER <fct> 4594920201, 4594920201, 4594920201, 4594920201, 45949...
Now we can plot it like normal using ggplot():
ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")Because the data file comes from the Census Bureau, we also get boundaries for Alaska, Hawaii, and Puerto Rico. To remove them from the data, just use filter():
usa2 <- usa2 %>%
filter(id != "Alaska", id != "Hawaii", id != "Puerto Rico")
ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")sf objectRecall the North Carolina simple features dataset:
glimpse(nc)## Observations: 100
## Variables: 15
## $ AREA <dbl> 0.114, 0.061, 0.143, 0.070, 0.153, 0.097, 0.062, 0.0...
## $ PERIMETER <dbl> 1.44, 1.23, 1.63, 2.97, 2.21, 1.67, 1.55, 1.28, 1.42...
## $ CNTY_ <dbl> 1825, 1827, 1828, 1831, 1832, 1833, 1834, 1835, 1836...
## $ CNTY_ID <dbl> 1825, 1827, 1828, 1831, 1832, 1833, 1834, 1835, 1836...
## $ NAME <fct> Ashe, Alleghany, Surry, Currituck, Northampton, Hert...
## $ FIPS <fct> 37009, 37005, 37171, 37053, 37131, 37091, 37029, 370...
## $ FIPSNO <dbl> 37009, 37005, 37171, 37053, 37131, 37091, 37029, 370...
## $ CRESS_ID <int> 5, 3, 86, 27, 66, 46, 15, 37, 93, 85, 17, 79, 39, 73...
## $ BIR74 <dbl> 1091, 487, 3188, 508, 1421, 1452, 286, 420, 968, 161...
## $ SID74 <dbl> 1, 0, 5, 1, 9, 7, 0, 0, 4, 1, 2, 16, 4, 4, 4, 18, 3,...
## $ NWBIR74 <dbl> 10, 10, 208, 123, 1066, 954, 115, 254, 748, 160, 550...
## $ BIR79 <dbl> 1364, 542, 3616, 830, 1606, 1838, 350, 594, 1190, 20...
## $ SID79 <dbl> 0, 3, 6, 2, 3, 5, 2, 2, 2, 5, 2, 5, 4, 4, 6, 17, 4, ...
## $ NWBIR79 <dbl> 19, 12, 260, 145, 1197, 1237, 139, 371, 844, 176, 59...
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-81.5 36.2,..., MULTIPO...
We have information on each county in the state of North Carolina. To plot using ggplot2, make sure you have the development version of ggplot2 installed using devtools::install_github("tidyverse/ggplot2"). We can use geom_sf() to plot the shapefile:
ggplot() +
geom_sf(data = nc)This produces just the boundaries.
ggmapRather than relying on geodatabases or shapefiles which store boundaries as numeric data, we can use ggmap to retrieve raster map tiles from online mapping services.
library(ggmap)get_googlemap("Saieh Hall For Economics, East 58th Street, Chicago, IL", zoom = 10) %>%
ggmap()This generates a ggplot() which uses the Google Maps image as a raster image, or background you can draw on top of.
leafletLeaflet is an open-source JavaScript library for drawing interactive maps. leaflet is a package for R that allows you to generate Leaflet maps using R functions and data objects. These maps are highly customizable and accept a wide range of data sources, including:
leaflet integrates nicely into the tidyverse in that the map objects (or widgets) are built as a series of piped operations. The basic process is:
leaflet()addTiles, addMarkers, addPolygons)library(leaflet)
m <- leaflet() %>%
setView(lng = -87.596154, lat = 41.790115, zoom = 14)
m %>%
addTiles()As you saw in The Truthful Art, representing portions of the globe on a flat surface can be challenging. Depending on how you project the map, you can distort or emphasize certain features of the map. Fortunately, ggplot() includes the coord_map() function which allows us to easily implement different projection methods.5 Depending on the projection method, you may need to pass additional arguments to coord_map() to define the standard parallel lines used in projecting the map:
ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(color = "black", fill = "gray") +
coord_map() +
ggtitle("Mercator projection (default)")ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(color = "black", fill = "gray") +
coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
ggtitle("Albers equal-area projection")ggplot(data = map_data("world"), mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(color = "black", fill = "gray") +
coord_map(projection = "mollweide", xlim = c(-180, 180)) +
ggtitle("Mollweide projection")Region boundaries serve as the background in geospatial data visualization - so now we need to add data. Some types of channels (points and symbols) are overlaid on top of the boundaries, whereas other channels (fill) are incorporated into the region layer itself. Let’s look at the first set of channels.
Let’s use our usa2 map data to add some points. The airports data frame in the nycflights13 package includes geographic info on airports in the United States.
library(nycflights13)
airports## # A tibble: 1,458 x 8
## faa name lat lon alt tz dst tzone
## <chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr>
## 1 04G Lansdowne Airport 41.1 -80.6 1044 -5. A America/New_…
## 2 06A Moton Field Municip… 32.5 -85.7 264 -6. A America/Chic…
## 3 06C Schaumburg Regional 42.0 -88.1 801 -6. A America/Chic…
## 4 06N Randall Airport 41.4 -74.4 523 -5. A America/New_…
## 5 09J Jekyll Island Airpo… 31.1 -81.4 11 -5. A America/New_…
## 6 0A9 Elizabethton Munici… 36.4 -82.2 1593 -5. A America/New_…
## 7 0G6 Williams County Air… 41.5 -84.5 730 -5. A America/New_…
## 8 0G7 Finger Lakes Region… 42.9 -76.8 492 -5. A America/New_…
## 9 0P2 Shoestring Aviation… 39.8 -76.6 1000 -5. U America/New_…
## 10 0S9 Jefferson County In… 48.1 -123. 108 -8. A America/Los_…
## # ... with 1,448 more rows
Each airport has it’s geographic location encoded through lat and lon. To draw these points on the map, basically we draw a scatterplot with x = lon and y = lat. In fact we could simply do that:
ggplot(airports, aes(lon, lat)) +
geom_point()Let’s overlay it with the mapped state borders:
ggplot() +
coord_map() +
geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
color = "black", fill = "gray") +
geom_point(data = airports, aes(x = lon, y = lat), shape = 1)Slight problem. We have airports listed outside of the continental United States. There are a couple ways to rectify this. Unfortunately airports does not include a variable identifying state so the filter() operation is not that simple. The easiest solution is to crop the limits of the graph to only show the mainland:
ggplot() +
coord_map(xlim = c(-130, -60),
ylim = c(20, 50)) +
geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
color = "black", fill = "gray") +
geom_point(data = airports, aes(x = lon, y = lat), shape = 1)If we want to change the projection method, the points will automatically adjust too:
ggplot() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50,
xlim = c(-130, -60),
ylim = c(20, 50)) +
geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
color = "black", fill = "gray") +
geom_point(data = airports, aes(x = lon, y = lat), shape = 1)We can change the size or type of symbols on the map. For instance, we can draw a bubble plot (also known as a proportional symbol map) and encode the altitude of the airport through the size channel:
ggplot() +
coord_map(xlim = c(-130, -60),
ylim = c(20, 50)) +
geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
color = "black", fill = "white") +
geom_point(data = airports, aes(x = lon, y = lat, size = alt),
fill = "grey", color = "black", alpha = .2) +
theme_void() +
theme(legend.position = "none")Circle area is proportional to the airport’s altitude (in feet). Or we could scale it based on the number of arriving flights in flights:
airports_n <- flights %>%
count(dest) %>%
left_join(airports, by = c("dest" = "faa"))
ggplot() +
coord_map(xlim = c(-130, -60),
ylim = c(20, 50)) +
geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
color = "black", fill = "white") +
geom_point(data = airports_n, aes(x = lon, y = lat, size = n),
fill = "grey", color = "black", alpha = .2) +
theme_void() +
theme(legend.position = "none")
airportscontains a list of virtually all commercial airports in the United States. Howeverflightsonly contains data on flights departing from New York City airports (JFK, LaGuardia, or Newark) and only services a few airports around the country.
Choropleth maps encode information by assigning shades of colors to defined areas on a map (e.g. countries, states, counties, zip codes). There are lots of ways to tweak and customize these graphs, which is generally a good idea because remember that color is one of the harder-to-decode channels.
We’ll continue to use the usa2 shapefile. Let’s reload it and also load and tidy the county shapefile:
usa <- readOGR("data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp")## OGR data source with driver: ESRI Shapefile
## Source: "/Users/soltoffbc/Projects/Data Visualization/dataviz/notes/data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp", layer: "cb_2013_us_state_20m"
## with 52 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
usa2 <- usa %>%
fortify(region = "GEOID") %>%
as_tibble %>%
left_join(usa@data, by = c("id" = "GEOID")) %>%
# filter out Alaska, Hawaii, Puerto Rico via FIPS codes
filter(!(STATEFP %in% c("02", "15", "72")))
counties <- readOGR("data/census_bureau/cb_2013_us_county_20m/cb_2013_us_county_20m.shp")## OGR data source with driver: ESRI Shapefile
## Source: "/Users/soltoffbc/Projects/Data Visualization/dataviz/notes/data/census_bureau/cb_2013_us_county_20m/cb_2013_us_county_20m.shp", layer: "cb_2013_us_county_20m"
## with 3221 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
counties2 <- counties %>%
fortify(region = "GEOID") %>%
as_tibble %>%
left_join(counties@data, by = c("id" = "GEOID")) %>%
# filter out Alaska, Hawaii, Puerto Rico via FIPS codes
filter(!(STATEFP %in% c("02", "15", "72")))
ggplot(counties2, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(color = "black", fill = "gray") +
coord_map()We’ll draw choropleths for the number of foreign-born individuals in each region (state or county). We can get those files from the census_bureau folder. Let’s also normalize our measure by the total population to get the rate of foreign-born individuals in the population:
(fb_state <- read_csv("data/census_bureau/ACS_13_5YR_B05012_state/ACS_13_5YR_B05012.csv") %>%
mutate(rate = HD01_VD03 / HD01_VD01))## # A tibble: 51 x 10
## GEO.id GEO.id2 `GEO.display-la… HD01_VD01 HD02_VD01 HD01_VD02 HD02_VD02
## <chr> <chr> <chr> <int> <chr> <int> <int>
## 1 04000… 01 Alabama 4799277 <NA> 4631045 2881
## 2 04000… 02 Alaska 720316 <NA> 669941 1262
## 3 04000… 04 Arizona 6479703 <NA> 5609835 7725
## 4 04000… 05 Arkansas 2933369 <NA> 2799972 2568
## 5 04000… 06 California 37659181 <NA> 27483342 30666
## 6 04000… 08 Colorado 5119329 <NA> 4623809 5778
## 7 04000… 09 Connecticut 3583561 <NA> 3096374 5553
## 8 04000… 10 Delaware 908446 <NA> 831683 2039
## 9 04000… 11 District of Col… 619371 <NA> 534142 2017
## 10 04000… 12 Florida 19091156 <NA> 15392410 16848
## # ... with 41 more rows, and 3 more variables: HD01_VD03 <int>,
## # HD02_VD03 <int>, rate <dbl>
(fb_county <- read_csv("data/census_bureau/ACS_13_5YR_B05012_county/ACS_13_5YR_B05012.csv") %>%
mutate(rate = HD01_VD03 / HD01_VD01))## # A tibble: 3,143 x 10
## GEO.id GEO.id2 `GEO.display-la… HD01_VD01 HD02_VD01 HD01_VD02 HD02_VD02
## <chr> <chr> <chr> <int> <int> <int> <int>
## 1 05000… 01001 Autauga County,… 54907 NA 54019 277
## 2 05000… 01003 Baldwin County,… 187114 NA 180308 802
## 3 05000… 01005 Barbour County,… 27321 NA 26517 145
## 4 05000… 01007 Bibb County, Al… 22754 NA 22486 86
## 5 05000… 01009 Blount County, … 57623 NA 55122 262
## 6 05000… 01011 Bullock County,… 10746 NA 10170 484
## 7 05000… 01013 Butler County, … 20624 NA 20467 91
## 8 05000… 01015 Calhoun County,… 117714 NA 114892 392
## 9 05000… 01017 Chambers County… 34145 NA 33786 101
## 10 05000… 01019 Cherokee County… 26034 NA 25849 106
## # ... with 3,133 more rows, and 3 more variables: HD01_VD03 <int>,
## # HD02_VD03 <int>, rate <dbl>
Now that we have our data, we want to draw it on the map. To do that, we have to join together our data sources - the shapefiles and the CSVs. Normally joining data files requires a _join() operation of some sort. However when using ggplot2, we don’t have to do this. Remember that we can pass different data frames into different layers of a ggplot() object. Rather than using geom_polygon() to draw our maps, now we switch to geom_map():
ggplot(fb_state, aes(map_id = GEO.id2)) +
geom_map(aes(fill = rate), map = usa2) +
expand_limits(x = usa2$long, y = usa2$lat)Let’s break down what just happened:
fb_state is the data frame with the variables we want to visualizemap_id = GEO.id2 identifies the column in fb_state that uniquely matches each observation to a region in usa2geom_map(aes(fill = rate), map = usa2)
fill = rate identifies the column in fb_state we will use to determine the color of each regionmap = usa2 is the data frame containing the boundary coordinatesexpand_limits(x = usa2$long, y = usa2$lat) ensures the graph is drawn to the proper window. Because the default data frame for this ggplot() object is fb_state, it won’t contain the necessary information to size the windowWe can then tweak this up by adding a title, removing the background (but retaining the legend), and projecting the map using a different method:
ggplot(fb_state, aes(map_id = GEO.id2)) +
geom_map(aes(fill = rate), map = usa2) +
expand_limits(x = usa2$long, y = usa2$lat) +
scale_fill_continuous(labels = scales::percent) +
labs(title = "Rate of foreign-born individuals in the population",
fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50)We could do the same thing for the county-level data:
ggplot(fb_county, aes(map_id = GEO.id2)) +
geom_map(aes(fill = rate), map = counties2) +
expand_limits(x = counties2$long, y = counties2$lat) +
scale_fill_continuous(labels = scales::percent) +
labs(title = "Rate of foreign-born individuals in the population",
fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50)sfIf we want to plot feature attributes, the attributes are already joined with the geometric features. Just use the fill aesthetic:
ggplot() +
geom_sf(data = nc, aes(fill = BIR74))Because nc is a data frame, we can also apply basic reshaping functions to it. For instance, what if we want a facet plot of the SID rate in 1974 and 1979?
(nc2 <- nc %>%
select(SID74, SID79, geometry) %>%
gather(VAR, SID, -geometry))## Simple feature collection with 200 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.3 ymin: 33.9 xmax: -75.5 ymax: 36.6
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
## First 10 features:
## VAR SID geometry
## 1 SID74 1 MULTIPOLYGON (((-81.5 36.2,...
## 2 SID74 0 MULTIPOLYGON (((-81.2 36.4,...
## 3 SID74 5 MULTIPOLYGON (((-80.5 36.2,...
## 4 SID74 1 MULTIPOLYGON (((-76 36.3, -...
## 5 SID74 9 MULTIPOLYGON (((-77.2 36.2,...
## 6 SID74 7 MULTIPOLYGON (((-76.7 36.2,...
## 7 SID74 0 MULTIPOLYGON (((-76 36.3, -...
## 8 SID74 0 MULTIPOLYGON (((-76.6 36.3,...
## 9 SID74 4 MULTIPOLYGON (((-78.3 36.3,...
## 10 SID74 1 MULTIPOLYGON (((-80 36.3, -...
ggplot() +
geom_sf(data = nc2, aes(fill = SID)) +
facet_wrap(~ VAR, ncol = 1)Recall that Cleveland and McGill identify the color channel as one of the most difficult channels for humans to properly decode and interpret. Selection of your color palette is perhaps the most important decision to make when drawing a choropleth.
By default, ggplot2 picks evenly spaced hues around the Hue-Chroma-Luminance (HCL) color space:6
# generate simulated data points
sim_points <- data_frame(x = factor(1:6))
plots <- purrr::map(1:6, ~ ggplot(sim_points[1:.x, ], aes(x, x, color = x)) +
geom_point(size = 5) +
ggtitle(paste(.x, "color")) +
theme(legend.position = "none"))
gridExtra::marrangeGrob(plots, nrow = 2, ncol = 3, top = NULL)ggplot2 gives you many different ways of defining and customizing your scale_color_ and scale_fill_ palettes, but will not tell you if they are optimal for your specific usage in the graph.
Color Brewer is a diagnostic tool for selecting optimal color palettes for maps with discrete variables. The authors have generated different color palettes designed to make differentiating between categories easy depending on the scaling of your variable. All you need to do is define the number of categories in the variable, the nature of your data (sequential, diverging, or qualitative), and a color scheme. There are also options to select palettes that are colorblind safe, print friendly, and photocopy safe. Depending on the combination of options, you may not find any color palette that matches your criteria. In such a case, consider reducing the number of data classes.
fb_county %>%
mutate(rate_cut = cut_number(rate, 6)) %>%
ggplot(aes(map_id = GEO.id2)) +
geom_map(aes(fill = rate_cut), map = counties2) +
expand_limits(x = counties2$long, y = counties2$lat) +
labs(title = "Rate of foreign-born individuals in the population",
fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
scale_fill_brewer(palette = "BuGn")fb_county %>%
mutate(rate_cut = cut_number(rate, 6)) %>%
ggplot(aes(map_id = GEO.id2)) +
geom_map(aes(fill = rate_cut), map = counties2) +
expand_limits(x = counties2$long, y = counties2$lat) +
labs(title = "Rate of foreign-born individuals in the population",
fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
scale_fill_brewer(palette = "YlGn")fb_county %>%
mutate(rate_cut = cut_number(rate, 6)) %>%
ggplot(aes(map_id = GEO.id2)) +
geom_map(aes(fill = rate_cut), map = counties2) +
expand_limits(x = counties2$long, y = counties2$lat) +
labs(title = "Rate of foreign-born individuals in the population",
fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
scale_fill_brewer(palette = "Blues")state_data <- data_frame(name = state.name,
region = state.region,
subregion = state.division,
abb = state.abb) %>%
bind_cols(as_tibble(state.x77)) %>%
# get id variable into data frame
left_join(usa2 %>%
select(id, NAME) %>%
distinct,
by = c("name" = "NAME")) %>%
# remove Alaska and Hawaii
na.omit
# set region base plot
region_p <- ggplot(state_data, aes(map_id = id)) +
geom_map(aes(fill = region), map = usa2) +
expand_limits(x = usa2$long, y = usa2$lat) +
labs(fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50)
region_p# try different color brewers
region_p +
scale_fill_brewer(palette = "Paired")region_p +
scale_fill_brewer(palette = "Dark2")region_p +
scale_fill_brewer(palette = "Pastel2")# set subregion base plot
subregion_p <- ggplot(state_data, aes(map_id = id)) +
geom_map(aes(fill = subregion), map = usa2) +
expand_limits(x = usa2$long, y = usa2$lat) +
labs(fill = NULL) +
ggthemes::theme_map() +
coord_map(projection = "albers", lat0 = 25, lat1 = 50)
subregion_psubregion_p +
scale_fill_brewer(palette = "Paired")subregion_p +
scale_fill_brewer(palette = "Set1")subregion_p +
scale_fill_brewer(palette = "Pastel1")devtools::session_info()## setting value
## version R version 3.4.3 (2017-11-30)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/Chicago
## date 2018-05-22
##
## package * version date source
## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
## backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
## base * 3.4.3 2017-12-07 local
## bindr 0.1.1 2018-03-13 CRAN (R 3.4.3)
## bindrcpp * 0.2.2 2018-03-29 CRAN (R 3.4.4)
## broom * 0.4.4 2018-03-29 CRAN (R 3.4.3)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.0)
## class 7.3-14 2015-08-30 CRAN (R 3.4.3)
## classInt 0.1-24 2017-04-16 CRAN (R 3.4.0)
## cli 1.0.0 2017-11-05 CRAN (R 3.4.2)
## codetools 0.2-15 2016-10-05 CRAN (R 3.4.3)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0)
## compiler 3.4.3 2017-12-07 local
## crayon 1.3.4 2017-10-03 Github (gaborcsardi/crayon@b5221ab)
## crosstalk 1.0.0 2016-12-21 CRAN (R 3.4.0)
## data.table 1.10.4-3 2017-10-27 CRAN (R 3.4.2)
## datasets * 3.4.3 2017-12-07 local
## DBI 0.8 2018-03-02 CRAN (R 3.4.3)
## devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
## digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
## dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.2)
## e1071 1.6-8 2017-02-02 CRAN (R 3.4.0)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## fiftystater * 1.0.1 2016-11-14 CRAN (R 3.4.0)
## forcats * 0.3.0 2018-02-19 CRAN (R 3.4.3)
## foreign 0.8-69 2017-06-22 CRAN (R 3.4.3)
## gapminder * 0.3.0 2017-10-31 CRAN (R 3.4.2)
## geosphere 1.5-7 2017-11-05 CRAN (R 3.4.2)
## ggmap * 2.6.1 2016-01-23 CRAN (R 3.4.0)
## ggplot2 * 2.2.1.9000 2018-05-18 Github (tidyverse/ggplot2@54de616)
## glue 1.2.0 2017-10-29 CRAN (R 3.4.2)
## graphics * 3.4.3 2017-12-07 local
## grDevices * 3.4.3 2017-12-07 local
## grid 3.4.3 2017-12-07 local
## gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
## haven 1.1.1 2018-01-18 CRAN (R 3.4.3)
## hms 0.4.2 2018-03-10 CRAN (R 3.4.3)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## htmlwidgets 1.2 2018-04-19 cran (@1.2)
## httpuv 1.3.6.2 2018-03-02 CRAN (R 3.4.3)
## httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
## jpeg 0.1-8 2014-01-23 CRAN (R 3.4.0)
## jsonlite 1.5 2017-06-01 CRAN (R 3.4.0)
## knitr * 1.20 2018-02-20 CRAN (R 3.4.3)
## labeling 0.3 2014-08-23 CRAN (R 3.4.0)
## lattice 0.20-35 2017-03-25 CRAN (R 3.4.3)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2)
## leaflet * 1.1.0 2017-02-21 CRAN (R 3.4.0)
## lubridate 1.7.4 2018-04-11 CRAN (R 3.4.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## mapproj 1.2.6 2018-03-29 CRAN (R 3.4.4)
## maps * 3.3.0 2018-04-03 CRAN (R 3.4.4)
## maptools * 0.9-2 2017-03-25 CRAN (R 3.4.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
## methods * 3.4.3 2017-12-07 local
## mime 0.5 2016-07-07 CRAN (R 3.4.0)
## mnormt 1.5-5 2016-10-15 CRAN (R 3.4.0)
## modelr * 0.1.1 2017-08-10 local
## munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
## nlme 3.1-137 2018-04-07 CRAN (R 3.4.4)
## nycflights13 * 0.2.2 2017-01-27 CRAN (R 3.4.0)
## parallel 3.4.3 2017-12-07 local
## pillar 1.2.1 2018-02-27 CRAN (R 3.4.3)
## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
## plotly * 4.7.1.9000 2018-05-17 Github (ropensci/plotly@efe5535)
## plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
## png 0.1-7 2013-12-03 CRAN (R 3.4.0)
## proto 1.0.0 2016-10-29 CRAN (R 3.4.0)
## psych 1.8.3.3 2018-03-30 CRAN (R 3.4.4)
## purrr * 0.2.4 2017-10-18 CRAN (R 3.4.2)
## R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
## Rcpp 0.12.16 2018-03-13 CRAN (R 3.4.4)
## readr * 1.1.1 2017-05-16 CRAN (R 3.4.0)
## readxl 1.0.0 2017-04-18 CRAN (R 3.4.0)
## reshape2 1.4.3 2017-12-11 CRAN (R 3.4.3)
## rgdal * 1.2-18 2018-03-17 CRAN (R 3.4.4)
## rgeos * 0.3-26 2017-10-31 CRAN (R 3.4.2)
## RgoogleMaps 1.4.1 2016-09-18 CRAN (R 3.4.0)
## rjson 0.2.15 2014-11-03 CRAN (R 3.4.0)
## rlang 0.2.0.9001 2018-05-18 Github (r-lib/rlang@854174a)
## rmarkdown 1.9 2018-03-01 CRAN (R 3.4.3)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.4.1)
## rvest 0.3.2 2016-06-17 CRAN (R 3.4.0)
## scales 0.5.0.9000 2018-05-18 Github (hadley/scales@d767915)
## sf * 0.6-1 2018-03-22 CRAN (R 3.4.4)
## shiny 1.0.5 2017-08-23 cran (@1.0.5)
## sp * 1.2-7 2018-01-19 CRAN (R 3.4.3)
## stats * 3.4.3 2017-12-07 local
## stringi 1.1.7 2018-03-12 CRAN (R 3.4.3)
## stringr * 1.3.0 2018-02-19 CRAN (R 3.4.3)
## tibble * 1.4.2 2018-01-22 CRAN (R 3.4.3)
## tidyr * 0.8.0 2018-01-29 CRAN (R 3.4.3)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.2)
## tools 3.4.3 2017-12-07 local
## udunits2 0.13 2016-11-17 CRAN (R 3.4.0)
## units 0.5-1 2018-01-08 CRAN (R 3.4.3)
## utils * 3.4.3 2017-12-07 local
## viridisLite 0.3.0 2018-02-01 CRAN (R 3.4.3)
## withr 2.1.2 2018-05-18 Github (jimhester/withr@79d7b0d)
## xml2 1.2.0 2018-01-24 CRAN (R 3.4.3)
## xtable 1.8-2 2016-02-05 CRAN (R 3.4.0)
## yaml 2.1.18 2018-03-08 CRAN (R 3.4.4)
See chapter 10 in The Truthful Art for a more detailed introduction to these features.↩
Originally obtained from the Census Bureau.↩
Or as we’ll see shortly, apply a map projection to the graph.↩
This function replaces coord_fixed().↩
Check out chapter 6.6.2 in ggplot2: Elegant Graphics for Data Analysis for a much more thorough explanation of the theory behind this selection process↩